Efficient multivariate sequence classification
Abstract
Kernel-based approaches for sequence classification have been successfully applied in a variety of domains, including text categorization, image classification, speech analysis, biological sequence analysis, and time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subsequence kernels) are restricted to discrete univariate (i.e., one-dimensional) string data, such as sequences of words in text analysis, codeword sequences in image analysis, or nucleotide or amino acid sequences in DNA and protein sequence analysis. However, the original sequence data are often real-valued and multivariate, i.e., they are not univariate and discrete as required by typical k-mer based sequence kernel functions. In this work, we consider the problem of multivariate sequence classification (e.g., classification of multivariate music sequences or multidimensional protein sequence representations). To this end, we extend univariate kernel functions typically used in sequence domains and propose an efficient multivariate similarity kernel method (MVDFQ-SK) based on (1) a direct feature quantization (DFQ) of each sequence dimension in the original real-valued multivariate sequences and (2) the application of novel multivariate discrete kernel measures to these multivariate discrete DFQ sequence representations, to capture similarity relationships among sequences more accurately and improve classification performance. Experiments using the proposed MVDFQ-SK kernel method show excellent classification performance on three challenging music classification tasks as well as on protein sequence classification, with significant 25-40% improvements over univariate kernel methods and existing state-of-the-art sequence classification methods.
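The two-step idea in the abstract (quantize each dimension, then compare the resulting discrete sequences with a k-mer kernel) can be illustrated with a minimal sketch. This is not the MVDFQ-SK method itself, whose quantizer and multivariate kernel measures are not specified in the abstract. The shared bin edges `edges`, the plain spectrum (k-mer count) kernel, and the sum over dimensions are all assumptions chosen for illustration, as are the function names.

```python
from collections import Counter

import numpy as np


def kmer_counts(symbols, k):
    """Count all contiguous k-mers in a list of discrete symbols."""
    return Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))


def spectrum_kernel(a, b, k):
    """Univariate spectrum kernel: inner product of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())


def multivariate_kernel(x, y, edges, k=2):
    """Hypothetical multivariate kernel, for illustration only.

    x, y  : real-valued arrays of shape (length, n_dims)
    edges : increasing bin edges shared by all sequences (an assumption)
    Each dimension is quantized independently, and the per-dimension
    spectrum kernels are summed (also an assumption).
    """
    qx = np.digitize(np.asarray(x, dtype=float), edges)
    qy = np.digitize(np.asarray(y, dtype=float), edges)
    return sum(
        spectrum_kernel(qx[:, d].tolist(), qy[:, d].tolist(), k)
        for d in range(qx.shape[1])
    )


# Two toy 2-dimensional sequences with values in [0, 1].
x = [[0.1, 0.9], [0.3, 0.8], [0.1, 0.9], [0.3, 0.8]]
y = [[0.1, 0.1], [0.3, 0.1], [0.6, 0.1]]
edges = [0.25, 0.5, 0.75]  # four bins per dimension

k_xy = multivariate_kernel(x, y, edges, k=2)
k_xx = multivariate_kernel(x, x, edges, k=2)
```

Using one set of bin edges for every sequence keeps the discrete alphabets comparable across sequences. A sum of per-dimension kernels is itself a valid kernel, so the result can be passed to any kernel classifier as a precomputed Gram matrix.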
Similar resources
Efficient multivariate kernels for sequence classification
Kernel-based approaches for sequence classification have been successfully applied to a variety of domains, including the text categorization, image classification, speech analysis, biological sequence analysis, time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subseq...
Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task
In this paper, we have tried to predict earthquake events in a cluster of seismic data on pacific ring of fire, using multivariate adaptive regression splines (MARS). The model is employed as either a predictor for a sequence prediction task, or a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...
Mining and Classification of Multivariate Sequential Data
Multivariate sequence mining and classification are important and challenging tasks. They can be applied to numerous domains including medical diagnosis, handwriting deficiency diagnosis, identification of users for security or personalized TV services, and even transportation and traffic planning. The problem we address in this dissertation is classification of multivariate sequences. Multivar...
Multivariate Statistical Analysis Decision-making Hybrid Method for Road Traffic Safety Evaluation in Iran
Obviously, improving the road safety and the efficient allocation of limited resources to the provinces according to their ranking should be done. This paper presents a hybrid method of multivariate statistical analysis-decision making to evaluate Iran road traffic safety. In order to solve the problems of road traffic safety, a macroscopic evaluation and traffic safety level classification in ...
Directions for computing truncated multivariate Taylor series
Efficient recurrence relations for computing arbitrary-order Taylor coefficients for any univariate function can be directly applied to a function of n variables by fixing a direction in Rn. After a sequence of directions, the multivariate Taylor coefficients or partial derivatives can be reconstructed or “interpolated”. The sequence of univariate calculations is more efficient than multivariat...
Journal title:
- CoRR
Volume: abs/1409.8211
Publication date: 2014